[Task Manager] Log at different levels based on the state #101751
Merged
chrisronline merged 24 commits into elastic:master on Jun 16, 2021
Relates to #101505
This PR introduces logic that will change how we log the monitoring stats to the Kibana server log:
Currently, we write a debug log entry each time an event is pushed into the stream (not literally every event, since we utilize throttling). This is helpful if the user has configured verbose logging. More commonly, users do not have this configured, because it involves seeing a lot of noise, so this logging paradigm has limited use: a user would need to know there was a problem, restart Kibana with the config change, and then observe the metrics, assuming the problem happens regularly.
This PR changes that by writing to a different log level based on a few things:
- The `status` reported for the monitored stats (`runtime`, `configuration`, and `workload`). If the `status` is `Warning`, then we log as a warning. If the `status` is `Error`, we log as an error.
- Whether `stats.runtime.value.drift.p99` is above a configurable threshold (which defaults to `1m` for no particular reason, so any insight here would be great). If this happens, we log as a warning.

This will help ensure these metrics are written to the logs when Task Manager is underperforming, and will give valuable insight into why.
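
For reviewers who want the decision in one place, the level selection described above could be sketched roughly as follows. This is only an illustration and not the code in this PR: the `MonitoredHealth` shape, the `chooseLogLevel` name, the `'OK'` status value, and the `driftWarnThreshold` parameter are all hypothetical, and the threshold is assumed to be a plain number in the same unit as the p99 drift value.

```typescript
// Hypothetical types for illustration; the real Task Manager types may differ.
type HealthStatus = 'OK' | 'Warning' | 'Error';
type LogLevel = 'debug' | 'warn' | 'error';

interface MonitoredHealth {
  status: HealthStatus;
  stats: {
    runtime: { value: { drift: { p99: number } } };
  };
}

// Pick a log level from the health status and the p99 drift.
// driftWarnThreshold is assumed to use the same unit as drift.p99.
function chooseLogLevel(health: MonitoredHealth, driftWarnThreshold: number): LogLevel {
  // An Error status always wins.
  if (health.status === 'Error') {
    return 'error';
  }
  // A Warning status is logged as a warning.
  if (health.status === 'Warning') {
    return 'warn';
  }
  // Otherwise, warn only when p99 drift exceeds the configured threshold.
  if (health.stats.runtime.value.drift.p99 > driftWarnThreshold) {
    return 'warn';
  }
  // Healthy and within the threshold: keep the existing debug-level behavior.
  return 'debug';
}
```

With this ordering, an `Error` status takes precedence over the drift check, and a healthy status with acceptable drift still produces the debug entry that exists today.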